Checkpoints V2: add migration option #855
Merged
computermode merged 31 commits into `main` on Apr 8, 2026
Entire-Checkpoint: 079c1c0e0eeb
Pre-session dirty files (CLI config files from `entire enable`, leftover changes from previous sessions) were incorrectly counted as human contributions, deflating agent percentage.

Root cause: PA1 (first prompt attribution) captures worktree state at session start. This data was used to correct agent line counts (correct) but also added to human contributions (wrong).

Fix:
- Split prompt attributions into baseline (PA1) and session (PA2+)
- PA1 data still subtracted from agent work (correct agent calc)
- PA1 contributions excluded from relevantAccumulatedUser
- PA1 removals excluded from totalUserRemoved
- Include PendingPromptAttribution during condensation for agents that skip SaveStep (e.g., Codex mid-turn commits)
- Add .entire/ filter to attribution calc (matches existing PA filter)
- Fix wrapcheck lint errors in updateCombinedAttributionForCheckpoint

Verified end-to-end: 100% agent with config files committed alongside.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: b0cb4216f6bc
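The baseline/session split in the fix list above can be sketched roughly as follows. This is only an illustration, not the code from this PR: the `PromptAttribution` struct, its fields, and the `attribute` helper are hypothetical, and removals are left out for brevity. The point is that PA1 lines are still subtracted from the agent's count but never added to the human total.

```go
package main

import "fmt"

// PromptAttribution is a hypothetical, simplified record of the lines a
// human added to the worktree by the time a given prompt was submitted.
type PromptAttribution struct {
	PromptIndex int // 1 for PA1 (session start), 2+ for later prompts
	UserAdded   int // lines added outside the agent
}

// attribute splits attributions into baseline (PA1) and session (PA2+).
// Every entry is subtracted from the agent's line count, but only session
// entries are credited to the human.
func attribute(totalAdded int, pas []PromptAttribution) (agent, human int) {
	agent = totalAdded
	for _, pa := range pas {
		agent -= pa.UserAdded
		if pa.PromptIndex > 1 {
			human += pa.UserAdded // PA1 is baseline state, not session work
		}
	}
	if agent < 0 {
		agent = 0
	}
	return agent, human
}

func main() {
	// 10 pre-session dirty lines (PA1) plus 40 agent lines in one commit.
	agent, human := attribute(50, []PromptAttribution{{PromptIndex: 1, UserAdded: 10}})
	fmt.Printf("agent=%d human=%d\n", agent, human)
}
```

With only a PA1 entry the human total stays at zero, which corresponds to the "100% agent" result reported in the commit message.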
…ibution

Checkpoint package changes required by the attribution baseline fix:
- PromptAttributionsJSON field on WriteCommittedOptions and CommittedMetadata
- UpdateCheckpointSummary method on GitStore for multi-session aggregation
- CombinedAttribution field on CheckpointSummary
- Preserve existing CombinedAttribution during summary rewrites

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: b8963737336c
…arentCommitHash

Fixes all 4 issues from Copilot and Cursor Bugbot review:
1. Precompute parentCommitHash on postCommitActionHandler struct using ParentHashes[0] (avoids extra object read, no silent error)
2. Remove duplicated 6-line parentCommitHash computation from HandleCondense and HandleCondenseIfFilesTouched
3. Thread parentTree through condenseOpts/attributionOpts and use it for non-agent file line counting. This ensures diffLines uses parent→HEAD (consistent with parentCommitHash file scoping) instead of sessionBase→HEAD, which over-counted intermediate commit changes
4. Add ParentTreeForNonAgentLines test proving the fix (TDD verified: HumanAdded=8 without fix → HumanAdded=3 with fix)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 12f5c4373467
Three fixes for multi-session attribution:
1. Cross-session file exclusion: Thread allAgentFiles (union of all sessions' FilesTouched) through the attribution pipeline. Files created by other agent sessions are no longer counted as human work.
2. Exclude .entire/ from commit session fallback: When the commit session has no FilesTouched and falls back to all committed files, filter out .entire/ metadata created by `entire enable`.
3. PA1 baseline uses base tree for new sessions: New sessions (StepCount == 0) always diff against the base commit tree, not the shared shadow branch which may contain other sessions' state.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
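Fixes 1 and 2 above boil down to a set union followed by a filter. A minimal sketch, assuming sessions are represented as plain string slices of touched paths; the function names `unionFilesTouched` and `humanCandidates` are made up for illustration and are not the identifiers used in the repository:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// unionFilesTouched builds the set of files touched by any agent session.
func unionFilesTouched(sessions [][]string) map[string]bool {
	all := make(map[string]bool)
	for _, files := range sessions {
		for _, f := range files {
			all[f] = true
		}
	}
	return all
}

// humanCandidates returns the committed files that may count as human work:
// files no agent session touched, excluding .entire/ metadata.
func humanCandidates(committed []string, allAgentFiles map[string]bool) []string {
	var out []string
	for _, f := range committed {
		if strings.HasPrefix(f, ".entire/") || allAgentFiles[f] {
			continue
		}
		out = append(out, f)
	}
	sort.Strings(out)
	return out
}

func main() {
	agentFiles := unionFilesTouched([][]string{{"a.go"}, {"b.go"}})
	committed := []string{"a.go", "b.go", ".entire/settings.json", "README.md"}
	fmt.Println(humanCandidates(committed, agentFiles))
}
```

Here only `README.md` remains a candidate for human attribution, because `a.go` and `b.go` belong to agent sessions and the `.entire/` path is metadata.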
Entire-Checkpoint: 209a37190167
Pull request overview
Adds an initial `entire migrate` CLI command intended to migrate v1 checkpoints to the v2 checkpoint ref/layout for testing and rollout prep.
Changes:
- Registers a new `migrate` subcommand on the root CLI.
- Introduces v1→v2 checkpoint migration logic, including transcript compaction and attempted task-metadata tree copying.
- Adds unit tests covering basic migration flows and idempotency.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| cmd/entire/cli/root.go | Registers the new migrate command in the root CLI. |
| cmd/entire/cli/migrate.go | Implements v1→v2 migration logic, transcript compaction, and task metadata tree splicing. |
| cmd/entire/cli/migrate_test.go | Adds tests for migration behavior (basic/idempotent/multi-session/flag validation). |
Entire-Checkpoint: 573a97ec8d2c
Entire-Checkpoint: 3790cba265e6
Entire-Checkpoint: c9595c52ab4a
Entire-Checkpoint: 9f07aeebbf93
Entire-Checkpoint: f1c37c8efc47
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…tering

- Test AllAgentFiles cross-session exclusion in CalculateAttributionWithAccumulated
- Test committedFilesExcludingMetadata filters .entire/ paths

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The combined_attribution field now diffs parent→HEAD once and classifies files as agent vs human based on the union of sessions with real checkpoints (SaveStep ran). Filters .entire/ and .claude/ config paths. Also adds ReadSessionMetadata for lightweight per-session metadata reads. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…mmit-inflation Fix attribution inflation from intermediate commits
don't show multiple spaces for codex single line start message rendering
Entire-Checkpoint: 36db97269a69
Entire-Checkpoint: 93066e1dac3c
Entire-Checkpoint: 4fdb72622b7f
bugbot run
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix prepared a fix for the issue found in the latest run.
- ✅ Fixed: Root-level tasks overwritten by per-session tasks splice
  `copyTaskMetadataToV2` now merges root-level and latest-session task trees before splicing so both task sets are preserved instead of one overwriting the other.
Or push these changes by commenting:
@cursor push 033e8c1ad0
Preview (033e8c1ad0)
diff --git a/cmd/entire/cli/migrate.go b/cmd/entire/cli/migrate.go
--- a/cmd/entire/cli/migrate.go
+++ b/cmd/entire/cli/migrate.go
@@ -359,14 +359,18 @@
 		return err
 	}
 
+	latestSessionIdx := -1
+	if len(summary.Sessions) > 0 {
+		latestSessionIdx = len(summary.Sessions) - 1
+	}
+
 	// Legacy v1 layout stores task metadata at checkpoint root: <cp>/tasks/<tool-use-id>/...
-	// Prefer attaching this tree to the latest session in v2.
-	if rootTasksTree, rootTasksErr := v1Tree.Tree("tasks"); rootTasksErr == nil {
-		if len(summary.Sessions) > 0 {
-			latestSessionIdx := len(summary.Sessions) - 1
-			if spliceErr := spliceTasksTreeToV2(repo, v2Store, cpID, latestSessionIdx, rootTasksTree.Hash); spliceErr != nil {
-				return fmt.Errorf("latest session task tree splice failed: %w", spliceErr)
-			}
+	// Attach this to the latest session in v2, and merge with that session's own tasks if present.
+	var rootTasksTree *object.Tree
+	rootTasksSpliced := false
+	if latestSessionIdx >= 0 {
+		if tasksTree, rootTasksErr := v1Tree.Tree("tasks"); rootTasksErr == nil {
+			rootTasksTree = tasksTree
 		}
 	}
 
@@ -382,11 +386,33 @@
 			continue // No tasks directory in this session
 		}
 
-		if spliceErr := spliceTasksTreeToV2(repo, v2Store, cpID, sessionIdx, tasksTree.Hash); spliceErr != nil {
+		tasksTreeHash := tasksTree.Hash
+		if rootTasksTree != nil && sessionIdx == latestSessionIdx {
+			mergedTasksTreeHash, mergeErr := checkpoint.UpdateSubtree(
+				repo,
+				rootTasksTree.Hash,
+				nil,
+				tasksTree.Entries,
+				checkpoint.UpdateSubtreeOptions{MergeMode: checkpoint.MergeKeepExisting},
+			)
+			if mergeErr != nil {
+				return fmt.Errorf("failed to merge root and session task trees for session %d: %w", sessionIdx, mergeErr)
+			}
+			tasksTreeHash = mergedTasksTreeHash
+			rootTasksSpliced = true
+		}
+
+		if spliceErr := spliceTasksTreeToV2(repo, v2Store, cpID, sessionIdx, tasksTreeHash); spliceErr != nil {
 			return fmt.Errorf("session %d task tree splice failed: %w", sessionIdx, spliceErr)
 		}
 	}
 
+	if rootTasksTree != nil && !rootTasksSpliced {
+		if spliceErr := spliceTasksTreeToV2(repo, v2Store, cpID, latestSessionIdx, rootTasksTree.Hash); spliceErr != nil {
+			return fmt.Errorf("latest session task tree splice failed: %w", spliceErr)
+		}
+	}
+
 	return nil
 }
 
diff --git a/cmd/entire/cli/migrate_test.go b/cmd/entire/cli/migrate_test.go
--- a/cmd/entire/cli/migrate_test.go
+++ b/cmd/entire/cli/migrate_test.go
@@ -228,6 +228,52 @@
 	require.NoError(t, taskFileErr, "expected migrated task checkpoint metadata in /full/current")
 }
 
+func TestMigrateCheckpointsV2_TaskMetadataMergesRootAndSessionTasks(t *testing.T) {
+	t.Parallel()
+	repo := initMigrateTestRepo(t)
+	v1Store, v2Store := newMigrateStores(repo)
+
+	cpID := id.MustCheckpointID("c1d2e3f4a5b6")
+
+	metadataDir := t.TempDir()
+	sessionTaskFile := filepath.Join(metadataDir, "tasks", "toolu_01SESSION", "checkpoint.json")
+	require.NoError(t, os.MkdirAll(filepath.Dir(sessionTaskFile), 0o755))
+	require.NoError(t, os.WriteFile(sessionTaskFile, []byte(`{"source":"session"}`), 0o644))
+
+	// Write one v1 task checkpoint that has both:
+	// 1) root-level task metadata (legacy layout, from IsTask/ToolUseID)
+	// 2) session-level task metadata (from MetadataDir copy into session subtree)
+	err := v1Store.WriteCommitted(context.Background(), checkpoint.WriteCommittedOptions{
+		CheckpointID: cpID,
+		SessionID:    "session-task-merge-001",
+		Strategy:     "manual-commit",
+		Transcript:   []byte("{\"type\":\"assistant\",\"message\":\"task merge\"}\n"),
+		Prompts:      []string{"task merge prompt"},
+		IsTask:       true,
+		ToolUseID:    "toolu_01ROOT",
+		MetadataDir:  metadataDir,
+		AuthorName:   "Test",
+		AuthorEmail:  "test@test.com",
+	})
+	require.NoError(t, err)
+
+	var stdout bytes.Buffer
+	result, migrateErr := migrateCheckpointsV2(context.Background(), repo, v1Store, v2Store, &stdout)
+	require.NoError(t, migrateErr)
+	assert.Equal(t, 1, result.migrated)
+
+	_, rootTreeHash, refErr := v2Store.GetRefState(plumbing.ReferenceName(paths.V2FullCurrentRefName))
+	require.NoError(t, refErr)
+	rootTree, treeErr := repo.TreeObject(rootTreeHash)
+	require.NoError(t, treeErr)
+
+	// Both root-level and per-session tasks must exist after migration.
+	_, rootTaskErr := rootTree.File(cpID.Path() + "/0/tasks/toolu_01ROOT/checkpoint.json")
+	require.NoError(t, rootTaskErr, "expected root-level task metadata in /full/current")
+	_, sessionTaskErr := rootTree.File(cpID.Path() + "/0/tasks/toolu_01SESSION/checkpoint.json")
+	require.NoError(t, sessionTaskErr, "expected session-level task metadata in /full/current")
+}
+
 func TestMigrateCheckpointsV2_AllSkippedOnRerun(t *testing.T) {
 	t.Parallel()
 	repo := initMigrateTestRepo(t)
Comment @cursor review or bugbot run to trigger another review on this PR
Reviewed by Cursor Bugbot for commit d7e367f. Configure here.
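The autofix above merges the root-level and per-session task trees so that neither overwrites the other. As a rough illustration only, the idea can be shown with plain maps instead of git tree objects. This sketch assumes that "keep existing" means entries already present in the base win on a name collision; the actual semantics of `checkpoint.UpdateSubtree` and `MergeKeepExisting` are not visible in this excerpt, and `mergeKeepExisting` below is a hypothetical helper.

```go
package main

import "fmt"

// mergeKeepExisting returns a new map containing every entry of base plus
// every entry of incoming whose name is not already present in base.
// Keys stand in for tree entry names, values for object hashes.
func mergeKeepExisting(base, incoming map[string]string) map[string]string {
	merged := make(map[string]string, len(base)+len(incoming))
	for name, hash := range base {
		merged[name] = hash
	}
	for name, hash := range incoming {
		if _, exists := merged[name]; !exists {
			merged[name] = hash
		}
	}
	return merged
}

func main() {
	rootTasks := map[string]string{"toolu_01ROOT": "hash-root"}
	sessionTasks := map[string]string{"toolu_01SESSION": "hash-session"}
	merged := mergeKeepExisting(rootTasks, sessionTasks)
	fmt.Println(len(merged)) // both task directories survive the merge
}
```

Neither input map is mutated, so the same root tree could be merged into more than one session if that were ever needed.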
pfleidi reviewed Apr 7, 2026
pfleidi reviewed Apr 7, 2026
Entire-Checkpoint: 51d95c3209d7
pfleidi approved these changes Apr 8, 2026
Adds a (hidden) migration CLI command that accepts a `checkpoints` parameter to migrate from "v1" to "v2". I think this will be automated for end users when we are ready to green-light v2, but for now it's handy for testing.

Follows the `migrate` command validation for a test repo as written in https://github.com/entireio/cli/pull/839/changes#diff-f8101f182954049d140980fd56caf1e09e5c85a771f21c972d986ce2229d7e6eR439

Validated that for checkpoints with a transcript.jsonl in v1, the full.jsonl + transcript.jsonl files were created in the proper places for v2.

Rerunning the command skips checkpoints that are already migrated and shows which transcript.jsonl files couldn't be created.
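The rerun behavior described above is a standard idempotent migration loop. A minimal sketch, assuming checkpoints are identified by string IDs and the v2 store can be modeled as a set; `migrateAll` and the `skipped` counter are hypothetical (only a `migrated` count appears in the PR's tests):

```go
package main

import "fmt"

// result counts what a migration pass did.
type result struct {
	migrated int
	skipped  int
}

// migrateAll copies each v1 checkpoint ID into the v2 set unless it is
// already there, so running it twice never duplicates work.
func migrateAll(v1 []string, v2 map[string]bool) result {
	var r result
	for _, id := range v1 {
		if v2[id] {
			r.skipped++
			continue
		}
		v2[id] = true // stand-in for writing the checkpoint to v2 refs
		r.migrated++
	}
	return r
}

func main() {
	v1 := []string{"c1d2e3f4a5b6", "079c1c0e0eeb"}
	v2 := map[string]bool{}
	fmt.Printf("%+v\n", migrateAll(v1, v2)) // first run migrates everything
	fmt.Printf("%+v\n", migrateAll(v1, v2)) // second run only skips
}
```

The real command additionally backfills missing compact transcripts on rerun, which this sketch does not model.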
Inspecting an example transcript.jsonl file:
Note
Medium Risk
Introduces new migration logic that writes to v2 checkpoint refs and performs git tree/commit surgery, which could affect checkpoint data integrity if bugs exist; command is hidden and guarded by an explicit flag, limiting user impact.
Overview
Adds a hidden `entire migrate --checkpoints v2` command to bulk-migrate committed checkpoints from v1 storage into the v2 refs.

Migration iterates v1 checkpoints, writes each session into v2 (optionally generating `transcript.jsonl` via compaction), and is idempotent by skipping already-migrated checkpoints while backfilling missing compact transcripts when possible. For task checkpoints, it also copies task metadata trees into v2 `/full/current` via subtree updates and commits.

Separately standardizes prompt serialization by introducing `PromptSeparator`, `JoinPrompts`, and `SplitPromptContent`, switching existing v1/v2 checkpoint writers to use the shared join helper and adding focused tests for prompt round-tripping and migration behavior.
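The prompt round-tripping mentioned in the overview can be illustrated with a pair of join/split helpers. This is a sketch under stated assumptions: the separator value below is invented for the example (the real `PromptSeparator` value is not shown in this excerpt), and the lowercase function names are stand-ins rather than the repository's `JoinPrompts` and `SplitPromptContent`.

```go
package main

import (
	"fmt"
	"strings"
)

// promptSeparator is a hypothetical delimiter chosen for this example only.
const promptSeparator = "\n\n---\n\n"

// joinPrompts serializes prompts into a single string.
func joinPrompts(prompts []string) string {
	return strings.Join(prompts, promptSeparator)
}

// splitPromptContent reverses joinPrompts. Empty content yields no prompts
// rather than a single empty prompt.
func splitPromptContent(content string) []string {
	if content == "" {
		return nil
	}
	return strings.Split(content, promptSeparator)
}

func main() {
	prompts := []string{"first prompt", "second prompt\nwith a newline"}
	roundTripped := splitPromptContent(joinPrompts(prompts))
	fmt.Println(len(roundTripped) == len(prompts))
}
```

Sharing one separator constant between the writer and the reader is what keeps the two sides in sync. Note that in this simple form a prompt that itself contains the separator would not round-trip.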